Goto

Collaborating Authors

 real-time search


Exploring Unknown Environments with Real-Time Search or Reinforcement Learning

Neural Information Processing Systems

Learning Real-Time A* (LRTA*) is a popular control method that interleaves plan(cid:173) ning and plan execution and has been shown to solve search problems in known environments efficiently. In this paper, we apply LRTA * to the problem of getting to a given goal location in an initially unknown environment. Uninformed LRTA * with maximal lookahead always moves on a shortest path to the closest unvisited state, that is, to the closest potential goal state. This was believed to be a good exploration heuristic, but we show that it does not minimize the worst-case plan-execution time compared to other uninformed exploration methods. This result is also of interest to reinforcement-learning researchers since many reinforcement learning methods use asynchronous dynamic programming, interleave planning and plan execution, and exhibit optimism in the face of uncertainty, just like LRTA *.


'An Absolute Monster Bluffer' -- Facebook & CMU AI Bot Beats Poker Pros

#artificialintelligence

Don't simply "all in" if there's a bot at your Texas hold'em poker table, because Facebook and Carnegie Mellon University's new Pluribus AI system just beat five human pros at the same time -- including a couple of World Series of Poker Champs. AI models had already bettered human poker pros one-on-one, but Pluribus's success in a six-player game signals a huge leap in ability. Texas hold'em is one of the most popular poker variants that involves game theory, gambling, and strategy. To win the game, each play must assemble the best five cards from any combination of two "hole cards" dealt face down to each player and five community cards dealt face up. Players can choose to check, bet, call, raise, and fold. Researchers regard poker as a meaningful and complex experimental field where they can explore how AI interacts with gaming theory and imperfect information.


Improved Safe Real-Time Heuristic Search

AAAI Conferences

Empirically, this optimization lead to 0.5 - 2.5% savings on expansions in our experiments A fundamental concern in real-time planning is the presence (Cserna, Gall, and Ruml 2019). of dead-ends in the state space, from which no goal is reachable. SafeRTS interleaves exploration and safety proofs during Providing real-time heuristic search algorithms that are its planning phase. As a direct consequence, it attempts complete in domains with dead-end states is a challenging safety proofs on nodes that become internal to the LSS by problem. Recently, the SafeRTS algorithm was proposed for the end of the search iteration. As shown in Cserna, Gall, and searching in such spaces (Cserna et al. 2018). SafeRTS exploits Ruml (2019), it would be equally or less difficult to achieve a user-provided predicate to identify safe states, from the same or better safety coverage by doing safety proofs after which a goal is likely reachable, and attempts to maintain a all the LSS expansions. SafeRTS has an anytime behavior backup plan for reaching a safe state at all times.


Real-Time Heuristic Search in Dynamic Environments

AAAI Conferences

PLRTA* conflates all states that differ only in time into a single abstract state. Abstract states inherit the union of all In dynamic environments such as cities, agents often do not the predecessors of their preimage states, so that backups have time to find a complete plan to reach a goal state. Planning can be performed properly. PLRTA* learns a single static in such environment requires an agent to update its plan heuristic value for each abstract state. For dynamic learning, frequently to respond to the changes around it. The setting PLRTA* performs the standard Dijkstra-style backup across of real-time heuristic search models online planning by requiring the LSS, considering only costs arising from the dynamic elements the agent to commit to its next action within a strict of the environment. As presented by Cannon, Rose, time limit. The time bound for planning is set to the time and Ruml (2014), the algorithm commits to only one step at which the actions to which the agent has already committed along the selected path, and then replans using updated information.


Improved Safe Real-time Heuristic Search

arXiv.org Artificial Intelligence

A fundamental concern in real-time planning is the presence of dead-ends in the state space, from which no goal is reachable. Recently, the SafeRTS algorithm was proposed for searching in such spaces. SafeRTS exploits a user-provided predicate to identify safe states, from which a goal is likely reachable, and attempts to maintain a backup plan for reaching a safe state at all times. In this paper, we study the SafeRTS approach, identify certain properties of its behavior, and design an improved framework for safe real-time search. We prove that the new approach performs at least as well as SafeRTS and present experimental results showing that its promise is fulfilled in practice.


Achieving Goals Quickly Using Real-time Search: Experimental Results in Video Games

Journal of Artificial Intelligence Research

In real-time domains such as video games, planning happens concurrently with execution and the planning algorithm has a strictly bounded amount of time before it must return the next action for the agent to execute. We explore the use of real-time heuristic search in two benchmark domains inspired by video games. Unlike classic benchmarks such as grid pathfinding and the sliding tile puzzle, these new domains feature exogenous change and directed state space graphs. We consider the setting in which planning and acting are concurrent and we use the natural objective of minimizing goal achievement time. Using both the classic benchmarks and the new domains, we investigate several enhancements to a leading real-time search algorithm, LSS-LRTA*. We show experimentally that 1) it is better to plan after each action or to use a dynamically sized lookahead, 2) A*-based lookahead can cause undesirable actions to be selected, and 3) on-line de-biasing of the heuristic can lead to improved performance. We hope this work encourages future research on applying real-time search in dynamic domains.


Metareasoning in Real-Time Heuristic Search

AAAI Conferences

Real-time heuristic search addresses the setting in which planning andacting can proceed concurrently. We explore the use of metareasoning at two decision points within a real-time heuristic search. First, if the domain has an `identity action' that allows the agent to remain in the same state and deliberate further, when should this action be taken? Second, given a partial plan that extends to the lookahead frontier, to how many actions should the agent commit? We show that considering these decisions carefully can reduce the agent's total time taken to arrive at a goal in several benchmark domains, relative to the current state-of-the-art. The resulting algorithm can dynamically adjust the way it interleaves planning and acting, between greedy hill-climbing and A*, depending on the problem instance.


Non-Classical Planning for Robotic Applications

AAAI Conferences

For my dissertation I am focusing on non-classical planning for robotic applications. Much classical planning research relies on assumptions that do not hold in real world robotics applications. In many cases the entire world state is not known in advance and the events that occur in the future can not be known with certainty. Robots operating in the real world also need to be responsive and react to dynamic obstacles and events.


Reconnecting with the Ideal Tree: An Alternative to Heuristic Learning in Real-Time Search

AAAI Conferences

In this paper, we present a conceptually simple, easy-to-implement real-time search algorithm suitable for a priori partially known environments. Instead of performing a series of searches towards the goal, like most Real-Time Heuristic Search Algorithms do, our algorithm follows the arcs of a tree T rooted in the goal state that is built initially using the heuristic h. When the agent observes that an arc in the tree cannot be traversed in the actual environment, it removes such an arc from T and our algorithm carries out a reconnection search whose objective is to find a path between the current state and any node in T. The reconnection search need not be guided by $h$, since the search objective is not to encounter the goal. Furthermore, h need not be updated. We implemented versions of our algorithm that utilize various blind search algorithms for reconnection. We show experimentally that these implementations significantly outperform state-of-the-art real-time heuristic search algorithms for the task of pathfinding in grids. In grids, our algorithms, which do not incorporate any geometrical knowledge, naturally behaves similarly to a bug algorithm, moving around obstacles, and never returning to areas that have been visited in the past. In addition, we prove theoretical properties of the algorithm.


Learning in Real-Time Search: A Unifying Framework

arXiv.org Artificial Intelligence

Real-time search methods are suited for tasks in which the agent is interacting with an initially unknown environment in real time. In such simultaneous planning and learning problems, the agent has to select its actions in a limited amount of time, while sensing only a local part of the environment centered at the agents current location. Real-time heuristic search agents select actions using a limited lookahead search and evaluating the frontier states with a heuristic function. Over repeated experiences, they refine heuristic values of states to avoid infinite loops and to converge to better solutions. The wide spread of such settings in autonomous software and hardware agents has led to an explosion of real-time search algorithms over the last two decades. Not only is a potential user confronted with a hodgepodge of algorithms, but he also faces the choice of control parameters they use. In this paper we address both problems. The first contribution is an introduction of a simple three-parameter framework (named LRTS) which extracts the core ideas behind many existing algorithms. We then prove that LRTA*, epsilon-LRTA*, SLA*, and gamma-Trap algorithms are special cases of our framework. Thus, they are unified and extended with additional features. Second, we prove completeness and convergence of any algorithm covered by the LRTS framework. Third, we prove several upper-bounds relating the control parameters and solution quality. Finally, we analyze the influence of the three control parameters empirically in the realistic scalable domains of real-time navigation on initially unknown maps from a commercial role-playing game as well as routing in ad hoc sensor networks.